Tags: data science

  1. This tutorial demonstrates semantic clustering of user messages by prompting Large Language Models (LLMs) to analyze publicly available Discord messages. It covers data extraction, sentiment scoring, KNN clustering, and visualization, and argues that this approach is faster and less labor-intensive than traditional data science workflows.

  2. A comprehensive guide to Large Language Models by Damien Benveniste, covering topics from transformer architectures to LLM deployment.

    • Language Models Before Transformers
    • Attention Is All You Need: The Original Transformer Architecture
    • A More Modern Approach To The Transformer Architecture
    • Multi-modal Large Language Models
    • Transformers Beyond Language Models
    • Non-Transformer Language Models
    • How LLMs Generate Text
    • From Words To Tokens
    • Training LLMs to Follow Instructions
    • Scaling Model Training
    • Fine-Tuning LLMs
    • Deploying LLMs

  3. Hex introduces Advanced Compute Profiles for demanding workflows, which provide more CPU, more RAM, and GPUs. It also offers Explore, a fast, flexible no-code data analysis tool. Hex emphasizes collaboration and AI integration and supports a wide range of use cases, including data science, operational reporting, and self-serve data tools.

    • TabPFN is a novel foundation model designed for small- to medium-sized tabular datasets, with up to 10,000 samples and 500 features.
    • It uses a transformer-based architecture and in-context learning (ICL) to outperform traditional gradient-boosted decision trees on these datasets.
  4. The article discusses how data scientists can answer "what if" questions about the impact of an action or event when no experiment was run beforehand. It focuses on building counterfactual predictions with machine learning and compares the proposed method with Google's CausalImpact. The approach uses historical data and control groups to estimate the effect of a change, while addressing challenges such as seasonality, confounders, and temporal drift.

  5. The article presents 11 tips for getting more out of the Pandas library when handling and analyzing large, complex datasets. It uses a real-world Airbnb listings dataset from Kaggle to illustrate techniques such as chunked processing and parallel execution.

  6. Despite its power, partial correlation remains underrated in data science. It addresses the main limitation of simple correlation by measuring the association between two variables after removing the influence of other variables.

  7. This article provides an overview of feature selection in machine learning, describing methods that maximize model accuracy while minimizing computational cost, and introduces a novel method called History-based Feature Selection (HBFS).

  8. An article on lesser-known Pandas functions for data transformation and analysis; mastering them can sharpen the data manipulation skills of data scientists working in Python.

  9. An article discussing ten predictions for the future of data science and artificial intelligence in 2025, covering topics such as AI agents, open-source models, safety, and governance.
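
For the counterfactual approach in item 4, a minimal sketch under stated assumptions: a plain linear regression of the treated series on untreated control series, fitted only on the pre-intervention window, then used to predict what the treated series would have been afterwards. This is a simplified stand-in, not the article's method and not CausalImpact (which uses Bayesian structural time-series models); the synthetic series and the +5 lift are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
T, t0 = 120, 90                              # 120 periods; the change starts at period 90
t = np.arange(T)
season = 10 * np.sin(2 * np.pi * t / 12)     # seasonality shared by all series

# Two hypothetical control series that are never exposed to the change
controls = np.column_stack([
    50 + season + rng.normal(0, 1, T),
    80 + 0.5 * season + rng.normal(0, 1, T),
])

# Treated series tracks the controls; a synthetic +5 lift is injected after t0
treated = 5 + 0.8 * controls[:, 0] + 0.4 * controls[:, 1] + rng.normal(0, 1, T)
treated[t0:] += 5.0

# Fit treated ~ intercept + controls on the pre-intervention period only
X = np.column_stack([np.ones(T), controls])
beta, *_ = np.linalg.lstsq(X[:t0], treated[:t0], rcond=None)

# Counterfactual: predicted treated values had the change not happened
counterfactual = X[t0:] @ beta
effect = treated[t0:] - counterfactual
print(f"estimated average lift: {effect.mean():.2f}")
```

Because the controls share the seasonal pattern, the regression absorbs seasonality, and the post-period gap between observed and predicted values recovers roughly the injected lift.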
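
For the chunked processing mentioned in item 5, a minimal sketch using `pandas.read_csv` with its `chunksize` parameter, which yields the file as a sequence of DataFrames instead of loading it all at once. The `neighbourhood` and `price` columns and the in-memory CSV are hypothetical stand-ins for the Airbnb listings file; the article's own code may differ, and parallel execution is not shown.

```python
import io
import pandas as pd

# Hypothetical stand-in for a large listings CSV (column names are assumptions)
csv_text = "neighbourhood,price\n" + "\n".join(
    f"{'A' if i % 2 else 'B'},{100 + i}" for i in range(1000)
)

def mean_price_by_group(source, chunksize=250):
    """Compute the mean price per neighbourhood without reading the whole file into memory."""
    partials = []
    # With chunksize set, read_csv returns an iterator of DataFrames of at most `chunksize` rows
    for chunk in pd.read_csv(source, chunksize=chunksize):
        # Keep sums and counts (not means) so per-chunk results combine exactly
        partials.append(chunk.groupby("neighbourhood")["price"].agg(["sum", "count"]))
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals["sum"] / totals["count"]

result = mean_price_by_group(io.StringIO(csv_text))
print(result)
```

Aggregating sums and counts rather than averaging per-chunk means keeps the result identical to a full in-memory `groupby(...).mean()`, even when chunks contain different numbers of rows per group.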
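
For item 6, one standard way to compute a partial correlation is to regress both variables on the control variable(s) and take the Pearson correlation of the residuals. A minimal NumPy sketch on synthetic data, where a hypothetical confounder `z` drives both `x` and `y`:

```python
import numpy as np

def partial_corr(x, y, z):
    """Correlation of x and y after linearly regressing out the control variable(s) z."""
    # Design matrix: intercept column plus the control variable(s)
    Z = np.column_stack([np.ones(len(x)), z])
    # Least-squares fits of x and y on Z; keep only the residuals
    beta_x, *_ = np.linalg.lstsq(Z, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rx = x - Z @ beta_x
    ry = y - Z @ beta_y
    # Partial correlation is the Pearson correlation of the residuals
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)              # confounder that drives both x and y
x = 2 * z + rng.normal(size=n)
y = 2 * z + rng.normal(size=n)

simple = np.corrcoef(x, y)[0, 1]
partial = partial_corr(x, y, z)
print(f"simple r = {simple:.2f}, partial r = {partial:.2f}")
```

Here the simple correlation is about 0.8 (theoretically 4/5), while the partial correlation controlling for `z` is close to zero, because the association between `x` and `y` comes entirely from `z`.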

SemanticScuttle - klotz.me: tagged with "data science"

About - Propulsed by SemanticScuttle